DTIC FILE COPY  AD A127261


NAVAL  POSTGRADUATE  SCHOOL 

Monterey,  California 


THESIS

THE METHODOLOGY OF COST ESTIMATION FOR U.S. MISSILES

By

Kyung Ho Choo

December 1982

Thesis Advisor: M.G. Sovereign

Approved  for  public  release;  distribution  unlimited 


REPRODUCED  FROM 
BEST  AVAILABLE  COPY 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
READ INSTRUCTIONS BEFORE COMPLETING FORM

3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle)

The Methodology of Cost Estimation for U.S. Missiles

5. TYPE OF REPORT & PERIOD COVERED

Master's Thesis; December 1982

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)

Kyung Ho Choo

8. CONTRACT OR GRANT NUMBER(s)

9. PERFORMING ORGANIZATION NAME AND ADDRESS

Naval Postgraduate School
Monterey, California 93940

11. CONTROLLING OFFICE NAME AND ADDRESS

Naval Postgraduate School
Monterey, California 93940

12. REPORT DATE

December 1982

13. NUMBER OF PAGES

S3 

15. SECURITY CLASS. (of this report)

Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)

Approved for public release; distribution unlimited

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

18. SUPPLEMENTARY NOTES

19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

CER
Adjustment
Constant Dollar
Regression
Explanatory Variable
Prediction Interval
Standard Error
Learning Curve

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

Public data on U.S. missile systems are used to demonstrate
the procedures and techniques for development of Cost Estimating
Relationships (CER) by statistical methods. First, attention is
given to data adjustment for constant dollars and quantities
since the data come from yearly budgets. Next, simple and
multiple linear regressions are performed in various combinations
of the three explanatory variables (weight, speed and range).

DD FORM 1473 (1 JAN 73)  EDITION OF 1 NOV 65 IS OBSOLETE
S/N 0102-014-6601

UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

20.  (continued) 

Learning  curves  are  introduced  to  derive  the  reduction  in  cost 
as  the  number  of  items  produced  increases. 




Approved  for  public  release,  distribution  unlimited 


The Methodology of Cost Estimation for U.S. Missiles


by 


Kyung  Ho  Choo 
Captain,  Korean  Army 
B.A., Korean Military Academy, 1975


Submitted in partial fulfillment of the
requirements  for  the  degree  of 

MASTER  OF  SCIENCE  IN  MANAGEMENT 

from  the 

NAVAL  POSTGRADUATE  SCHOOL 
December,  1982 


Dean  of  Information  and  Policy  Sciences 




ABSTRACT

Public data on U.S. missile systems are used to demonstrate
the procedures and techniques for development of Cost
Estimating Relationships (CER) by statistical methods.
First, attention is given to data adjustment for constant
dollars and quantities since the data come from yearly
budgets. Next, simple and multiple linear regressions are
performed in various combinations of three explanatory
variables (weight, speed and range). Learning curves are
introduced to derive the reduction in cost as the number of
items produced increases.

A cost estimate is a judgement or opinion regarding the
future cost of an object, commodity, or service [Ref. 1:
p. 1]. In this thesis, in particular, it is the cost of
missiles. This judgement or opinion may be arrived at
formally or informally by a variety of methods, all of which
are based on the assumption that experience is a reliable
guide to the future. In some cases the guidance is clear
and unequivocal. In others, it is not. Much, perhaps most,
estimating involves the relationship between past experience
and future application. The more interesting problems are
those in which the relationship is unclear, because the
proposed item differs in some significant way from its
predecessors. The challenge to cost analysts concerned with
military hardware is to project from the known to the unknown,
for example, to use experience on existing missiles to
predict the cost of the next-generation missile. The
techniques used for estimating hardware cost range from
intuition at one extreme to a very detailed "bottom-up"
application of labor and material industrial engineering
standards at the other. There are many methods to estimate
costs, but this thesis will discuss only the statistical
approach to estimating the cost of U.S. missiles.


In the statistical approach, estimating relationships
that use explanatory variables such as weight, speed, range,
and thrust are relied upon to predict the cost at a high
level of aggregation, either the missile itself or major
subsystems. To say that statistical techniques can be used
in a variety of situations does not imply that the
techniques are the same for all situations. They will vary
according to the purpose of the study and the information
available.

In a conceptual study, it is necessary to have a
procedure for estimating the total expected cost of a
program, and this must include an allowance for the
contingencies and unforeseen changes that seem to be an
inherent part of most development and production programs.
In effect, this procedure merely asserts the obvious: as
more is known, fewer assumptions are required. When enough
is known, and this means when a product is well into
production, accounting information and data can be taken
directly from records of account and used with a minimum of
statistical manipulation, i.e., only the adjustment for
change in "learning" and inflation as the systems are
produced. This technique is useful in those cases when the
future product or activity under consideration is
essentially the same as that for the past or current period,
which is often not the case. But all new missiles vary in
their characteristic parameters.

In any situation the estimating procedure to be used
should be determined by the data available, the purpose of
the estimate, and, to an extent, by such other factors as the
time available to make an estimate. In fact, since the life
of a modern weapon system may run twenty years (or longer),
the investment needed to establish a new system may be
dwarfed by the costs required to operate and maintain it.


II

The analysis of past cost data yields estimates of
future costs based on the cost relationships of previous
periods. The degree to which these data are appropriate
depends upon the extent to which cost behavior in the future
will correspond to that in the past and on the extent to
which we identify the relationships. If the change being
considered is extensive enough to bring about changes in the
underlying cost structure, such as the use of new technology,
the unadjusted historical cost data may be inappropriate.

A. DATA COLLECTION

There are three steps in data collection.

1. Data Collection

"Data collection is the process of identifying,
searching out, acquiring, verifying, and recording the
specific information that is of value to the analyst."
[Ref. 2: p. 11] Cost analysts have many data sources.
The cost information report (CIR) was established by DOD in
1956 to simplify the data collection problem. This reporting
system was designed to collect cost and related data on
major contracts for aircraft, missiles, and space programs.
Efforts are presently underway to enlarge the coverage of
the CIRs to other areas of defense contracting. The new
system is called contract cost data reporting (CCDR). The
reports are sent by contractors to the OSD's cost analysis
improvement group (CAIG). In the absence of CIR-type data,
the analyst must resort to contractor records, such as the
cost performance reports (CPR) sent to government program
managers, engineering records, managerial records, and other
periodical reports containing cost data such as the CSFR
(Contractors Status of Fund Report). But for subsystems,
this type of data is not necessarily available to the
analyst.

While collecting data, the analyst should keep in
mind the levels of accuracy and aggregation that he needs.
If cost data are available down to the component level, it
may be possible to proceed with a disaggregated method of
cost estimating, estimating each component and then
aggregating. No matter what approach is used, data
collection problems can be minimized by first becoming
familiar with the system's technology and second, by using
consistent definitions for the cost and parametric
variables. There are at least three different types of
historical data required to develop a statistical
cost-estimating procedure. First, there are the resource
data, usually in the form of expenditures and labor hours.
It is customary to apply the word cost to both, and that
practice is followed throughout this thesis. A second type
of data describes the possible cost-explanatory elements;
for hardware such as aircraft and missiles this means
performance and physical characteristics. The third type is
program data, i.e., information related to the development
and production history of past hardware programs.

a. Resource Data

Resource data are generally classified under
end-item categories or functional categories. Examples of
the former, in various possible levels of detail, are system,
subsystem, component, and part. The functional cost
categories, such as engineering, tooling, manufacturing,
quality control, and purchased equipment, are usually broken
down into cost elements: labor, material, overhead, and other
direct charges. The data source is the contractor's plant.
Generally, the accounting systems will vary from one company
to another and the amount of detail is immense. Theoretical
considerations aside, estimating techniques must be based on
whatever resource data the analyst can find, and in the past
the availability of data has varied from one kind of
equipment to another. The most data are given in the CPR,
which is currently available only by going to each project's
management.

b. Physical and Performance Characteristics

Information about the physical and performance
characteristics of missile systems is just as important as
resource data. Data collection in this area can be
time-consuming, particularly since it is not often clear in
advance what data will be required. The goal, of course, is
to obtain a list of those characteristics that best explain
differences in cost. Weight is a commonly used explanatory
variable, but weight alone is seldom enough; speed is almost
always included as a second explanatory variable for
missiles or aircraft. But speed is often useful only at the
total system level.

c. Program Data

A third type of essential data is drawn from the
development and production history of hardware items. The
acceptance date of the item, the significant milestones in
the development program, the production rates, and the
occurrence of major and minor modifications in production:
all such information can contribute to the development of
cost-estimating relationships. The schedule data are needed
for price adjustment in this thesis, for example.

2. Examination of Data for Homogeneity

Data must be checked to ensure that the cost changes
reflect only changes in the selected explanatory variables.
If changes have occurred from period to period in technology,
skills of the labor force, or the price level of inputs, the
cost measurement will be an amalgam of the change in output
and the changes in design characteristics in the
environment. Thus the cost data will not be homogeneous from
observation to observation. Nonhomogeneous observations
often result from technological or organizational
differences in different plants producing nearly identical
output. In order to work with a large number of
observations covering a wide range of output, it may be
necessary to work with the cost data of many similar
departments. If the nature of operations of the departments
varies, the behavior of the costs will reflect this diversity.
Unadjusted cost data should not be used if these differences
are significant. One solution is to aggregate the data above
organizational differences. Another is to add an explanatory
variable that measures the difference.

3. Selection of Independent Variable(s)

This step is analogous to the model building stage
in any research project. While the cost relationship will
usually be simple, involving only a few independent
variables, it must be hypothesized before the analysis can be
carried any further. Generally, we should choose an
independent variable on the basis of a reasonable belief that
some relationship exists between the variable and the cost
being estimated. The variables used in the estimating should
be the ones that exert the major effect on the cost
observed. Among the most widely employed variables are
weight and speed.

B. DATA ADJUSTMENT

There are three kinds of data adjustments.

1. Cost Definition Adjustments

Different contractor accounting practices and types
of contracts are the primary reasons for this type of
adjustment. An analyst should state the cost definition that
he wishes to use and then adjust the data to meet his
definition. It is sometimes impossible to obtain information
needed for consistent adjustments. Interpretation of the
final cost estimate should make allowances for this possible
source of anomalous cost behavior.

2. Price Level Adjustments

It is all too apparent that inflation changes the
purchasing power of the dollar dramatically. In order to
compare the cost of a system purchased in 1953 to the cost
of a new system, the cost figures must be adjusted to
"constant" dollars. The Bureau of Labor Statistics publishes
many indices that can be used for this purpose. With
sufficient data, it is possible to produce a weighted index
specifically for the type of system being estimated. This
can be a very laborious process, and so several general
indices are available for use. The producers price index
(PPI) is most useful for constructing indices for the
various appropriation accounts used by the military
services [Ref. 3: p. 24]. The Department of Defense also
publishes a procurement index to be used for general
military hardware. Best results are obtained from indexes
which are specialized to the type of equipment being
estimated. It is almost impossible to obtain an index that
will remove all of the price level changes of a particular
item. Table I gives the index needed to adjust the missile
costs in this thesis.

TABLE I
Price Index (Base Year 1983)

Year     Index
1972     39.79
1973     41.42
1974     45.09
1975     51.17
1976     54.95
1977     53.73
1978     52.95
1979     53.80
1980     79.79
1981     35.23
1982     93.20
1983    100.00
1984    135.21

Source: OSD (COMPTROLLER) 1982
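
The constant-dollar adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `to_constant_dollars` and the dictionary `PRICE_INDEX` are hypothetical, only a handful of Table I entries are copied in, and the 1983 entry is taken as 100.00 because 1983 is the base year.

```python
# A few index values copied from Table I (base year 1983 = 100.00).
# The dictionary and function names are illustrative, not from the thesis.
PRICE_INDEX = {
    1972: 39.79,
    1976: 54.95,
    1980: 79.79,
    1982: 93.20,
    1983: 100.00,
}


def to_constant_dollars(cost, year, base_year=1983, index=PRICE_INDEX):
    """Convert a then-year cost into constant base-year dollars.

    The cost is scaled by the ratio of the base-year index to the
    index of the year in which the money was actually spent.
    """
    return cost * index[base_year] / index[year]


if __name__ == "__main__":
    # Hypothetical example: $1.0 million spent in 1980, restated in 1983 dollars.
    print(round(to_constant_dollars(1.0, 1980), 4))
```

Costs from different budget years are made comparable by passing each one through the same function with the same base year before any regression is attempted.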

3. Cost Quantity Adjustments

The "learning curve" is a phenomenon prevalent in
many industries. As the cumulative number of identical items
produced doubles, the unit cost or a cumulative average cost
is reduced by a constant percentage showing "learning".

Learning curve information can be obtained from two
possible sources. The best source is the contractor's cost
records or CIR reports for individual units. Costs of the
units are plotted and a line is fitted to the plotted data.
A second source of information would be a general
industry-wide learning rate that may be published in the
industry's literature.

If a general learning rate is available, say 90%,
along with the cost of a particular unit (say unit #5), the
curve can be drawn by computing the cost for unit #10 (unit
#5 cost times .9), plotting the two points, and drawing a
line connecting the points on log-log paper. The assumed
learning curve can contribute large dollar errors if the
assumed rate used is not accurate. For example, a +1%
learning rate error in the example above gives a 3.261%
difference over 40 units. Quantity adjustment will be
discussed for the missile data in Chapter IV.
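
The two-point construction on log-log paper corresponds to a power-law curve, which can be sketched as follows. The sketch assumes the unit-cost form of the learning curve (not the cumulative-average form), and the function name `unit_cost` and the unit #5 cost of 1.0 are hypothetical.

```python
import math


def unit_cost(ref_cost, ref_unit, unit, learning_rate):
    """Cost of `unit` on a unit learning curve through (ref_unit, ref_cost).

    Each doubling of the unit number multiplies the cost by
    `learning_rate`, so
        cost(n) = ref_cost * (n / ref_unit) ** log2(learning_rate).
    """
    slope = math.log2(learning_rate)  # slope of the straight line on log-log paper
    return ref_cost * (unit / ref_unit) ** slope


if __name__ == "__main__":
    cost_5 = 1.0  # hypothetical cost of unit #5
    # Compare a 90% curve with a 91% curve at successive doublings.
    for rate in (0.90, 0.91):
        costs = [round(unit_cost(cost_5, 5, n, rate), 4) for n in (5, 10, 20, 40)]
        print(rate, costs)
```

Running the loop with two nearby rates shows how a small error in the assumed rate compounds with every doubling of quantity.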
Cost estimation relationships (CER) are developed from
the historical cost of like systems and the parameters
(e.g., weight, maximum speed, range) of these systems.

Statistical analysis can help provide an understanding
of factors that influence cost, but estimating relationships
are not a substitute for understanding: regression analysis,
which will be discussed in this thesis, does not offer a
quick and easy solution to all the problems of estimating
cost. The outstanding characteristic of a CER is that the
relationship between cost and explanatory variable is direct
and obvious; thus, cost per kg (or pound) is widely used
because of the generally satisfying thesis that as a missile,
tank, or airplane increases in weight it becomes more
costly. Weight changes alone do not always adequately
explain cost changes, and additional explanatory variables
are often needed. The problem is to find these variables and
their relationship to cost. The procedure is to decide what
variables are logically or theoretically related to cost and
then to look for the patterns in the data that suggest a
relationship between cost and the variables. Table II
contains a set of data on cost and selected variables that
can be analyzed for such patterns. The costs of twenty-two
missile sets are given with the weight, speed, and range of
each. It is to be expected that cost would increase with
weight or with speed or range.

TABLE II
Twenty-two Missile Sets

MISSILE        WEIGHT (kg)   WEIGHT (kg)   SPEED    RANGE      COST, 1,000th unit
               Launch        Payload       (Mach)   (N mile)   (1983 $ Mill)
AGM-86B        1,426.1       108.9         0.7      1350.0     1.9330
MIM-72C           83.9        12.7         2.5         2.6     0.1039
FGM-77A           11.5         2.5         0.3         0.5     0.0503
BGM-109C       1,224.7         -           0.7      2000.0     3.1826
AGM-88A          353.8        53.2         3.5        10.0     0.8632
RGM-84A          630.0       231.3         0.3        60.0     1.1668
AGM-114A          44.7         9.1         1.0         3.8     0.1496
MIM-23B          623.7        50.7         2.5        25.0     0.7952
MGM-52C        1,285.5       210.9         3.0        75.0     1.0599
AGM-65A/B        215.4        53.9         1.0        65.0     0.1554
AGM-65D          215.4        53.9         1.0        65.0     0.3596
MIM-104          902.7        90.7         3.0        37.0     3.1805
PERSHING-II    4,600.9       294.8         3.0      1000.0     3.3653
AIM-54A          446.8        59.9         5.0        76.0     1.2326
AIM-54C          453.6        59.9         5.0       100.0     1.2512
MIM-115           63.5         5.9         1.5         5.0     1.1975
AIM-9L/M          86.2        11.3         2.5         1.9     0.1329
AIM-7F/M         231.3        40.8         2.5        24.0     0.5271
RIM-67B        1,358.9        51.2         3.0        69.0     0.9878
RIM-66C          640.0        51.2         3.0        40.0     0.6146
FIM-92A           15.7         0.9         2.0         3.0     0.1275
BGM-109A/B     1,224.7       453.6         0.7       300.0     2.5616

A graphic analysis of the data in Table II shows that
cost is not a simple linear function of any of the three
explanatory variables. Cost tends to increase with weight,
but there are notable exceptions to the trends, as
illustrated by the scatter diagram of Fig. 3.1. Cost is plotted
against speed and range as shown in Fig. 3.2 and Fig. 3.3.
At this point, it is not clear if any of the explanatory
variables, either singly or in combination, will yield a
useful estimating relationship.

Figure 3.1 Scatter Diagram of Cost vs Weight for Data

Figure 3.3 Scatter Diagram of Cost vs Range for Data

To illustrate techniques that are commonly employed in
deriving estimating relationships, assume that cost can be
related to a single predictive variable, that of weight.
The results of a simple linear regression model will then be
examined. Later, several explanatory variables in multiple
regression analysis will be considered.

The statistical technique normally applied in developing
CERs from historical cost and parametric data is called
regression analysis. Regression analysis is primarily
concerned with the determination of the equation of a line
or curve which will predict how one variable (e.g., cost)
will vary with respect to some parameter (e.g., weight).

Regression has become a widely accepted tool for cost
analysis and it is frequently used to develop estimating
relationships. The technique of regression analysis can be
thought of as consisting of two distinct stages. The first
is that of estimating the constant and coefficients of the
equation, and the second is that of inferring the
reliability and significance of the results of the estimate on
the basis of assumed (and to a degree verifiable) properties
possessed by the data and the results. Regression analysis
as a technique is applicable only to the two stages
performed together. Estimating coefficients or curve
fitting is simply a mathematical exercise. Only when these
estimating procedures are used as a basis for making
statistical inferences can they be viewed as part of a
statistical analysis. Before performing regression analysis,
guidelines had to be established for determining what
constitutes satisfactory regression criteria. The guidelines
established for this missile study were as follows:

1. The interaction between the dependent cost variable
and the independent variables (time-index value, etc.)
is such that changes in the latter will generate
reasonable changes in the former.

2. The number of variables will be limited to three or
less because of the limited sample size.

3. Good statistical parameters such as low (20 percent or
less) coefficient of variation, significant coefficient
t-test, small standard error of estimate, etc.
are achieved [Ref. 4: p. 10].

4. The relationships should be as applicable as possible
to missile systems beyond the performance range of the
sample data; i.e., future systems.

A. SIMPLE LINEAR REGRESSION

Scatter diagrams of the 1st "theoretical" unit versus
the 1000th unit cost versus the time-index value of the
1000th unit were plotted and analyzed. But most analysts
usually choose the 1000th unit as a better projection
quantity than the 1st unit; the 1000th unit is the standard unit
used in this thesis. The form of the relationship between
cost and the explanatory variable(s) depends upon the
problem. It may reflect an underlying physical form that is
suspected. For physical characteristics, a simple linear
model is frequently used to describe the relationship
between two variables. In this case, the equation of the
model is

\[ y = a + b x , \]

where y is the dependent variable and x is the explanatory
variable. The symbols a and b are the constant and
coefficient, respectively, of the equation estimated from
the data. Here y could represent the procurement cost of
missiles and x could represent the weight. If it is assumed
that b is greater than zero, the model indicates that
heavier equipment will cost more than lighter equipment.
When the values of a and b are known, it is possible to
estimate y (cost) for any given value of x (weight).

1. Least-Squares Estimation

Given the equation \( y = a + b x \), the basic problem
in the first phase of the regression analysis is to derive
estimates of the parameters a and b. The standard procedure
is the method of least squares. The values of a and b are
determined by the requirement that the sum of the squared
deviations of the sample observations from the estimated
line will be a minimum. Symbolically, this minimum is
expressed as

\[ \min \sum_{i=1}^{n} ( y_i - \dot{y}_i )^2 , \]

where \( y_i \) is the ith observation and \( \dot{y}_i \) is the
value of \( y_i \) estimated from the equation

\[ \dot{y}_i = \dot{a} + \dot{b} x_i . \]

The dots over a and b indicate that \( \dot{a} \) and \( \dot{b} \)
are least-squares estimates of the true but unknown values of
a and b. Thus \( \dot{y}_i \) is the least-squares estimate of
\( y_i \), and the term \( ( y_i - \dot{y}_i ) \) indicates the
difference between each observed \( y_i \) and the corresponding
estimated value \( \dot{y}_i \). Figure 3.4 below contains the
outcome of a least-squares regression performed on the data
in Table II. The equation of the illustrated regression
line is:

\[ y = 0.5353 + 0.7289 \, W \]
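
The least-squares estimates have a closed form for a single explanatory variable, which can be sketched as below. This is a minimal illustration, not the computation used for the thesis results: the function name `fit_line` is hypothetical and the weights and costs in the example are made-up numbers, not rows of Table II.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b * x.

    b is the ratio of the summed cross-deviations to the summed squared
    deviations of x; a then forces the line through the point of means.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


if __name__ == "__main__":
    # Hypothetical weights (thousands of kg) and costs ($ millions).
    weights = [0.1, 0.4, 0.9, 1.3, 2.0]
    costs = [0.3, 0.9, 1.1, 1.6, 1.9]
    a, b = fit_line(weights, costs)
    print(f"cost = {a:.4f} + {b:.4f} * weight")
```

Any other choice of a and b would give a larger sum of squared deviations for the same data.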

An analyst who obtained such a model should be concerned with
the question: How well does the equation fit the data?
There are several statistical measures that can give an
indication of the ability of the model to describe the
data. The most commonly used measure of the "goodness of
fit" of the regression equation is the coefficient of
determination (\( r^2 \)):

\[ r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = 51.3\% \]

The coefficient of determination is the percentage of the
variation in the data explained by the regression model.
Ideally an analyst would want \( r^2 \) to approach 1.00.
The remaining variation may be explained when other
variables are considered and brought into the equation.

Figure 3.4 Regression Line and Standard Error of Estimate
(axes: Cost ($ millions) vs Weight (kg))

Figure 3.4 also has plotted on it the lines representing
the standard error of estimate. The greater the dispersion
of the observed values of cost about the regression line,
the less accurate the estimates that are based on that line
are likely to be. If the cost data follow a normal
distribution, approximately 68% of the data points should fall in
the area bounded by the two standard error lines. The
standard error of regression is a measure of the dispersion of
the data and is defined as the square root of the unexplained
variance.
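
A sketch of that definition in symbols, assuming the usual n - 2 degrees of freedom for a line with two estimated parameters (the divisor is an assumption, not taken from the text), with \( \dot{y}_i \) denoting the fitted value for observation i:

```latex
\[
  SE = \sqrt{ \frac{ \sum_{i=1}^{n} ( y_i - \dot{y}_i )^2 }{ n - 2 } }
\]
```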


This value of SE has been plotted above and below the
regression line in Figure 3.4. The interpretation and
significance of these results will be discussed in
connection with the use of prediction intervals.

In comparing one SE with another, it is useful to
compute a relative standard error of estimate. The
coefficient of variation (CV) is such a measure which relates the
standard error of the model to the mean value of the
dependent variable. A value of 10 to 20 percent for the CV
is desirable [Ref. 1: p. 44]. The standard error of the
model presented above is 0.7547 million dollars, and the
coefficient of variation is [Ref. 1: p. 44]:

\[ CV = \frac{SE}{\bar{y}} = 0.6563 , \]

where \( \bar{y} \) is the mean value of the dependent variable.
This 0.6563 value of the CV also serves as an indication that
the proposed model is not well suited to the data.
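
The three measures discussed above can be computed together once a line has been fitted. The following is a sketch under stated assumptions: the function name `fit_statistics` is hypothetical, the standard error uses n - 2 degrees of freedom, and the coefficient of determination is computed as one minus the unexplained share, which equals explained over total variation for a least-squares line with a constant term.

```python
import math


def fit_statistics(xs, ys, a, b):
    """Return (r2, se, cv) for the fitted line y = a + b * x.

    r2 : coefficient of determination
    se : standard error of estimate, assuming n - 2 degrees of freedom
    cv : coefficient of variation, se divided by the mean of y
    """
    n = len(ys)
    y_bar = sum(ys) / n
    fitted = [a + b * x for x in xs]
    unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    total = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1.0 - unexplained / total
    se = math.sqrt(unexplained / (n - 2))
    cv = se / y_bar
    return r2, se, cv


if __name__ == "__main__":
    # Hypothetical data and coefficients, for illustration only.
    xs = [0.1, 0.4, 0.9, 1.3, 2.0]
    ys = [0.3, 0.9, 1.1, 1.6, 1.9]
    r2, se, cv = fit_statistics(xs, ys, a=0.4, b=0.8)
    print(f"r2={r2:.3f}  SE={se:.3f}  CV={cv:.3f}")
```

A CV well above the 10 to 20 percent guideline, as in the weight-only model, signals that the fitted line leaves too much scatter relative to the typical cost.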

2.  Statist iai  IfiliSiSSt 

Statistical inference may be used to answer the two
questions that arise in connection with the problem of
reliability. To decide whether x and y are related, test for
statistical significance; to evaluate predictions, establish
a prediction interval for the regression line. However,
certain assumptions and conditions must be met before
standard techniques of statistical inference and testing can
be validly applied to least-squares results; namely, the data
are assumed to be a sample taken from a larger population
that meets the following conditions:

1. The x values are nonrandom (fixed) variables.

2. The residual deviations are independent random variables
with normal distributions.

3. The expected value of the distribution of each of
these random variables is zero, and the unknown variance
is the same for all values of x.

Under these assumptions, the hypothesized relationship
between y and x becomes:

$$ y_i = a + b x_i + u_i , $$

where i = 1, ..., n,

u_i = the normally distributed random error term
with zero expected value and a common and
unknown variance.

Further, under these assumptions, the least-squares method
produces unbiased maximum likelihood estimators. Standard
statistical techniques can be applied to the least-squares
results to test for significance and to make inferences
about reliability and accuracy in a probabilistic sense.
Although the subject of statistical testing is too complex
to treat comprehensively here, the method of testing the
significance of the relationship between x and y in the
simple regression of Figure 3.4 will be examined briefly.
Basically, the procedure involves establishing the null
hypothesis that x and y are not related (i.e., that b = 0),
and testing to determine whether the hypothesis should be
rejected. The test that is commonly used for this purpose is
known as the t-test because it uses the t-ratio, or ratio of
a coefficient to its standard error. For this simple
regression, the ratio is expressed as


$$ t = \frac{b}{s_b} = 5.13 $$

where b = the estimated regression coefficient (from
the equation ŷ = a + b x),

s_b = the standard error of b,

$$ s_b = \frac{SE}{\sqrt{\sum_i (x_i - \bar{x})^2}} $$

SE = the standard error of regression.

A standard table of t-ratios is required to use the t-ratio
equation to test the null hypothesis. If the calculated
value of t falls below the appropriate value of t selected
from this table, the null hypothesis that b = 0 would be
accepted, and it would be concluded that b is, in fact, not
significantly different from zero. The level of significance
indicates the probability that the null hypothesis will be
rejected when it is true. If there were evidence to justify
the assumption that the sign of the coefficient could be
only positive (or only negative) if it were different from
zero, the level of significance associated with each t could
be read directly from Student's t Critical Points Table.
However, the common practice in regression analysis is not
to make this assumption, but to test as though the value of
t (if it were different from zero) could be either positive
or negative. Because the distribution of the t-ratios is
symmetrical, the level of significance for the two-sided
test is twice the level of significance for the one-sided
test. Thus, the levels of significance of the t-values shown
in the Student's t Critical Points Table are only half the
actual levels for the two-sided test.

The question at this point is, what should the level
of significance be for rejecting the hypothesis?
Unfortunately, no simple answer is possible. The values of
.10, .05, and .01 are those that are most commonly used, but
the analyst must make a decision based on the risk that is
assumed when a true hypothesis is rejected. For this
missile data, no reasonable level would fail to reject the
hypothesis that b = 0.
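The test described in this section reduces to a comparison between the t-ratio and a tabulated critical value. The sketch below illustrates that comparison under stated assumptions: 20 degrees of freedom (22 observations less two estimated coefficients) is assumed, the critical values are copied from a standard two-sided Student's t table, and the function names are illustrative only.

```python
import math

# Two-sided critical values of Student's t for 20 degrees of freedom,
# copied from a standard table (assumed df = 22 observations - 2).
T_CRITICAL_DF20 = {0.10: 1.725, 0.05: 2.086, 0.01: 2.845}

def slope_standard_error(se, xs):
    """s_b = SE / sqrt(sum of squared deviations of x about its mean)."""
    x_bar = sum(xs) / len(xs)
    return se / math.sqrt(sum((x - x_bar) ** 2 for x in xs))

def t_ratio(b, s_b):
    """Ratio of a regression coefficient to its standard error."""
    return b / s_b

def reject_null(t, critical):
    """Two-sided test of b = 0: reject when |t| exceeds the critical value."""
    return abs(t) > critical

t = 5.13  # t-ratio reported above for the simple regression
for level, critical in sorted(T_CRITICAL_DF20.items()):
    print(f"level={level:.2f}  critical t={critical}  reject b=0: {reject_null(t, critical)}")
```

With a t-ratio of this size the hypothesis b = 0 is rejected at each of the three commonly used levels, which is the sense in which no reasonable level would fail to reject it.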


3. Prediction Intervals

The procedure for calculation of the prediction
interval for a simple regression is as follows. For a given
value of the explanatory variable, say x, the estimating
equation is used to obtain a predicted value of the
dependent variable:

$$ \hat{y} = a + b x $$

The prediction interval puts a boundary around ŷ:

$$ \hat{y} \pm A_{\varepsilon/2} $$

There is a certain level of confidence (1 - ε) that the
cost of a missile weighing x will be in that interval.
Values for ε/2 rather than ε are used since ŷ is to be
bounded on both sides. The value of ε can be divided by two
since, under the assumptions, the probability distribution
about ŷ is normal and therefore symmetrical. In statistical
terminology, a two-tailed t distribution is used for
constructing the intervals. In the case of simple
regression, a 100(1 - ε)-percent prediction interval for an
estimated value of the dependent variable can be constructed
as follows [Ref. 1: p. 51]:


$$ \hat{y} \pm A_{\varepsilon/2} $$

where

$$ A_{\varepsilon/2} = (SE)\, t_{\varepsilon/2} \sqrt{\frac{n+1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}} $$

and where

SE = the standard error of the estimating equation from
which ŷ was obtained,

t_{ε/2} = the value obtained from a table of t-values for
the ε/2 significance level,

n = the size of the sample,

x = the specified value of the explanatory variable used as
a basis for obtaining ŷ,

x̄ = the mean of the x's in the sample,

Σ(x_i - x̄)² = the sum of the squared deviations of the
sample x's from their sample mean.

This prediction interval procedure can be repeated for many
values of x and the results plotted to obtain a 90-percent
prediction interval band around the regression line, as
shown in Figure 3.5.

Figure 3.5 The 90-percent Prediction Interval Band for
Estimates Based on Sample Data (vertical axis: cost,
$ millions; horizontal axis: weight, kg)

In this figure, the 90-percent confidence region is fairly
wide because of the relatively large standard error of this
equation. The formula for the prediction interval is such
that the width of the interval is sensitive to the size of
the standard error; large standard errors indicate that much
of the cost variation in the observed data is unexplained by
the equation.
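The prediction interval calculation can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of Figure 3.5: the sample of weights is hypothetical, the coefficients and standard error are those reported in this chapter for the simple regression on weight, and the t value of 1.943 is the tabulated two-sided 90-percent value for the six degrees of freedom of the hypothetical eight-point sample.

```python
import math

def prediction_half_width(se, t_crit, xs, x_new):
    """A = SE * t * sqrt((n + 1)/n + (x - x_bar)^2 / sum((x_i - x_bar)^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return se * t_crit * math.sqrt((n + 1) / n + (x_new - x_bar) ** 2 / sxx)

def prediction_interval(a, b, se, t_crit, xs, x_new):
    """Lower and upper bounds of the interval around y_hat = a + b * x_new."""
    y_hat = a + b * x_new
    half = prediction_half_width(se, t_crit, xs, x_new)
    return y_hat - half, y_hat + half

# Hypothetical sample of weights (NOT the Table II values).
xs = [0.1, 0.3, 0.6, 1.0, 1.5, 2.1, 2.8, 3.6]
a, b, se = 0.5353, 0.7289, 0.7547  # simple regression on weight
t_90 = 1.943  # two-sided 90 percent, 8 - 2 = 6 degrees of freedom

for x_new in (0.5, 1.5, 3.5):
    low, high = prediction_interval(a, b, se, t_90, xs, x_new)
    print(f"x={x_new:.1f}  interval=({low:.3f}, {high:.3f})  width={high - low:.3f}")
```

Evaluating the function over a grid of x values and plotting the two bounds gives a band of the kind shown in Figure 3.5.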

The prediction interval becomes wider as values of x
farther from the mean of the sample are selected. This
change in the size of the prediction interval occurs because
the formulas are derived to allow for the possibility that
the estimated values of a and b differ from the true values
of a and b. Such a situation can occur when the sample data
contain chance fluctuations that prevent the data from
reflecting the true relationship that exists in the total
population or when there are not sufficient data in the
sample. The width of the prediction interval is also
sensitive to the level of confidence that is specified and
to the number of degrees of freedom; a change in either will
change the width of the prediction interval. However,
the difference in prediction interval size because of a
difference in degrees of freedom is more significant for
small samples than for large samples; the value of t for any
given level of significance becomes almost constant for
degrees of freedom over 30.

B. MULTIPLE REGRESSION


To this point, simple (one explanatory variable)
regression analysis has been used to examine the linear
relationship between cost and weight. With the array of data
shown in Table II and the logarithmic transformations of
these data, multiple (more than one explanatory variable)
regression analysis will now be examined. This section
covers the multiple linear case. Because the sample
documented in Table II contains only twenty-two
observations, the examination will be limited to various
combinations of two rather than three explanatory variables.
If additional observations were included in the sample,
three explanatory variables might be considered under
certain circumstances; however, this number of variables
used with so few observations would detract from the
credibility of the result. In any event, there is no great
loss in limiting the number of variables to two; the
essential differences between simple and multiple regression
can be illustrated with the two-explanatory-variable case.
In the linear case, the estimating equation is of the
general form

$$ y = a + b X_1 + c X_2 + d X_3 $$

The results for each of the possible combinations of two
from the set of four explanatory variables are as follows:

$$
\begin{aligned}
y &= 0.5353 + 0.7289\,W \\
y &= 0.6378 + 0.8209\,W - 0.0993\,S \\
y &= 0.5098 + 0.5239\,W + 0.0009\,R \\
y &= 0.3550 + 0.1353\,S + 0.0019\,R \\
y &= 0.9826 + 0.5011\,W - 0.0164\,S + 0.0008\,R
\end{aligned}
$$

where

y = cost in millions of dollars
W = total weight in kg (launch + payload)
S = speed in Mach
R = range in miles

To understand the use of t-ratios in multiple regression
equations, the meaning of the multiple regression
coefficients must be understood. In each case, the multiple
regression coefficient shows the net effect of an
explanatory variable. For example, the last equation above
can be interpreted as follows: for a given speed and range,
a 1-kg increase in total weight will cause a $500 increase
in cost.

As the degree of interdependence between explanatory
variables increases, regression results become less stable
and more indeterminate. As a consequence, the t-ratio
should not be the sole test for assessing the amount of
interdependence present. Further, it is not possible to
give a precise cutoff point at which explanatory variables
must always be considered too interdependent. A correlation
coefficient of 0.9 or more between explanatory variables
will almost certainly cause problems; one of 0.3 or less
usually will not [Ref. 1: p. 68]. The array of correlations
among the explanatory variables should always be examined in
the course of the analysis, and to the extent possible, the
use of interdependent explanatory variables should be
avoided.

The question arises: for cost-estimating purposes, is the
multiple regression with speed and range preferable to the
simple regression with weight as the explanatory variable?
To find an answer, the other measures by which the
regression equations are judged must be compared: the
standard error of regression, the coefficient of variation,
and the coefficient of determination. These are shown in
Table III for each of the multiple regressions for
comparison with the results obtained from the simple
regression. The primary concern in this comparison is
between the multiple regression with speed and range and the
simple regression with weight, since the speed and range
equation is the only one in which both explanatory variables
are significant.


TABLE III

Comparison of Multiple-linear with Simple-linear Regression
Results

                      Explanatory variables

         W         W & S     W & R     S & R     W & S & R

SE       0.7547    0.7505    0.6560    0.7525    0.6338

CV       0.6563    0.6605    0.5361    0.6622    0.5578

r^2      0.513     0.548     0.644     0.546     0.645

DF       21        19        19        19        18


The equation in which weight and speed are used appears to
give slightly better results than the simple regression in a
comparison of these measures. However, the coefficient of
the speed variable is not significant at the 10-percent
level. As a consequence, the improvement is not a
statistically significant one. The generalized test to
determine whether the incremental improvement associated
with the addition of a variable is significant uses an
F-statistic. The test performed with this statistic is
similar to the t-test. In this case, the null hypothesis is
that the increment is not significant. The statistic used to
test this null hypothesis is

$$ F = \frac{\text{Increment of explained variance} / \text{degrees of freedom}}{\text{Remaining unexplained variance} / \text{degrees of freedom}} $$

This can be rewritten as

$$ F = \frac{(R^2 - r^2) / 1}{(1 - R^2) / 19} $$

where

R^2 = the coefficient of determination of the equation
that includes total weight and speed,

r^2 = the coefficient of determination of the equation
with total weight alone.
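As a check on the arithmetic, the incremental F-statistic can be evaluated directly from the coefficients of determination in Table III. The sketch below assumes one added variable and 19 remaining degrees of freedom, as in the formula above, and uses the 10-percent critical value of 3.01 cited in this section; the function name is illustrative.

```python
def incremental_f(r2_full, r2_reduced, added_vars, df_resid):
    """F-statistic for the increment in explained variance from added variables."""
    explained = (r2_full - r2_reduced) / added_vars
    unexplained = (1.0 - r2_full) / df_resid
    return explained / unexplained

# Weight and speed (R^2 = 0.548) versus weight alone (r^2 = 0.513).
f_stat = incremental_f(r2_full=0.548, r2_reduced=0.513, added_vars=1, df_resid=19)
f_critical = 3.01  # 10-percent critical value cited in the text

print(round(f_stat, 4))  # 1.4712
print("significant" if f_stat > f_critical else "not significant")
```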
Substituting the appropriate coefficients of determination
in the formula for the F-statistic, we obtain

$$ F = \frac{(0.548 - 0.513) / 1}{(1 - 0.548) / 19} = 1.4712 $$

This value falls short of the critical value of F, which
equals 3.01 at the 10-percent level of significance. Thus,
the null hypothesis is accepted [Ref. 5: p. 282], and we
conclude that the net increment in explained variance
associated with the addition of speed to the equation
containing weight is insufficient to establish that the
improvement is not due to chance.


IV. THE LEARNING CURVE


The learning process is a phenomenon that prevails in
many industries; its existence has been verified by
empirical data and controlled tests. Although there are
several hypotheses on the exact manner in which the learning
or cost reduction can occur, the basis of learning-curve
theory is that each time the total quantity of items
produced doubles, the cost per item is reduced to a constant
percentage of its previous cost. For example, if the cost
of producing the 200th unit of an item is 80 percent of the
cost of producing the 100th unit, and if the cost of the
400th unit is 80 percent of the cost of the 200th, and so
forth, the production process is said to follow an
80-percent unit learning curve. If the average cost of
producing all 200 units is 80 percent of the average cost of
producing the first 100 units, the process follows an
80-percent cumulative average learning curve. There are
many factors that contribute to the learning curve. These
are all interrelated and, in general, no one factor can be
said to be dominant over the others. The principal factors
are as follows:

1. Worker efficiency

2. Methods and processes

3. Total production quantity

4. Type of product

5. Lot buys

6. Tooling concepts, test equipment

The above list of relevant factors is not complete, and
it tends to understate the importance of the item sometimes
considered the most important: labor learning.

A.  THE  RELATIONSHIP  BETWEEN  COST  AND  QUANTITY 

The relationship between cost and quantity may be
represented by a log-linear equation of the form

$$ y = a X^b , $$

where X equals the cumulative production quantity. The
relationship corresponds to a unit or a cumulative average
learning curve according to whether y is the cost of the Xth
unit or the average cost of the first X units. The constant
a is the cost of the first unit produced. The exponent b,
which measures the slope of the learning curve, bears a
simple relationship to the constant percentage to which cost
is reduced as the quantity is doubled. If S represents the
decimal fraction to which cost decreases when quantity
doubles, the exponent is

$$ b = \frac{\log S}{\log 2} $$

1. Log-linear Unit Curve

If a production process follows a unit learning
curve of the form y = a x^b, the cumulative cost T of
producing the first n units is

$$ T = a \sum_{x=1}^{n} x^b . $$

The cumulative average cost ȳ_n of producing the first n
units is then

$$ \bar{y}_n = \frac{T}{n} = \frac{a}{n} \sum_{x=1}^{n} x^b . $$
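The two sums above are easily evaluated by direct summation. The sketch below assumes a hypothetical first-unit cost of 100 and an 80-percent unit curve; neither value comes from the thesis data.

```python
import math

def unit_cost(a, b, x):
    """Cost of the x-th unit on a unit learning curve y = a * x**b."""
    return a * x ** b

def cumulative_cost(a, b, n):
    """T = a * sum of x**b for x = 1..n."""
    return a * sum(x ** b for x in range(1, n + 1))

def cumulative_average_cost(a, b, n):
    """Average cost of the first n units, T / n."""
    return cumulative_cost(a, b, n) / n

a = 100.0                          # hypothetical first-unit cost
b = math.log(0.80) / math.log(2)   # 80-percent unit curve

for n in (1, 2, 4, 8):
    print(n,
          round(unit_cost(a, b, n), 2),
          round(cumulative_cost(a, b, n), 2),
          round(cumulative_average_cost(a, b, n), 2))
```

On a unit curve the cumulative average always lies above the cost of the latest unit, because it still carries the higher cost of the early units.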

2. Log-linear Cumulative Average Curve

When a production process follows a log-linear
cumulative average curve rather than a unit curve, the basic
functional form is still y = a x^b but can be written

$$ y_c = a x^b , $$

where y_c is the average cost of the first x units. The
cumulative cost T of producing x units is simply y_c x, or

$$ T = a x^{b+1} , $$

and the unit cost is obtained from the function

$$ a \left\{ x^{b+1} - (x - 1)^{b+1} \right\} . $$
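The relationships for the cumulative average curve can be checked numerically: the unit costs obtained by differencing must sum back to the cumulative cost. The sketch below uses a hypothetical first-unit cost of 100 and an 80-percent cumulative average curve.

```python
import math

def cumulative_average(a, b, x):
    """Average cost of the first x units, y_c = a * x**b."""
    return a * x ** b

def total_cost(a, b, x):
    """Cumulative cost of the first x units, T = a * x**(b + 1)."""
    return a * x ** (b + 1)

def unit_cost(a, b, x):
    """Cost of the x-th unit: difference between successive cumulative costs."""
    return a * (x ** (b + 1) - (x - 1) ** (b + 1))

a = 100.0                          # hypothetical first-unit cost
b = math.log(0.80) / math.log(2)   # 80-percent cumulative average curve

for x in (1, 2, 4, 8):
    print(x,
          round(cumulative_average(a, b, x), 2),
          round(total_cost(a, b, x), 2),
          round(unit_cost(a, b, x), 2))
```

Under this form the unit cost lies below the cumulative average for every unit after the first, the reverse of the ordering by which a unit curve is usually described.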
B. APPLICATIONS


The learning curve is used for a variety of purposes and
in a variety of contexts; how the curve is drawn will depend
on the purpose and the context. In long-range planning
studies, for example, the curve must be constructed on the
basis of generalized historical data, and the possible error
is considerable. Empirical evidence does not support the
concept of a single slope for all solid propellant missiles,
all fighter aircraft, or all spacecraft. Therefore, the
practice of assuming that manufacturing hours on the
airframe will follow an 80-percent curve (as was common for
many years) or that electronic equipment will follow, say, a
90-percent curve, can lead to very large estimating errors.
For estimating to be effective, therefore, the learning
curve must be established on the basis of historical data
relevant to the specific problem. Such curves are equally
applicable to missiles, electronic equipment, aircraft,
ships and other types of equipment, but the slopes may be
different for each of these.

With a small sample of data, where a learning curve is
fitted to a few points, the correlation may be perfect,
i.e., all the points may lie on the fitted line, but the
results can still be unreliable. The points used in fitting
must be sufficiently numerous and reasonably homogeneous
with the points implied by extending the curve to offer a
reasonable probability of success in predicting costs.


Whatever the basic technique, it is important to
remember that on logarithmic grids the points at the right
are usually more important than those at the left. In
visually fitting a line, the analyst should avoid the
tendency to be unduly influenced by plot points for early
lots. Early units are often incomplete because they are used
for test purposes. It is equally possible that early units
will include certain nonrecurring problems incident to
startup and for this reason may be above the level suggested
by later plot points.

C. EXAMPLE OF LEARNING CURVE FROM BUDGET DATA

Often the only data available on a regular public basis
are the budget data. These are data for total cost by year
and quantity. Although this is not labor cost alone, it can
be used for estimating purposes. It must be adjusted to
similar quantities by the learning curve. The data in this
thesis came from the U.S. Missile Data Book [Ref. 6] and
U.S. Weapon Systems Costs [Ref. 7]. From these sources and
Table I for price adjustment, the data for Table II were
obtained. As an example, the data for the RIM-67B are shown
in Table IV, the learning curve is plotted in Figure 4.1,
and the calculations are shown in Appendix A. All of the
data for missiles in Table II were processed in a similar
way, and the estimates of the cumulative average cost at the
1000th unit were obtained.

TABLE IV

Plot Points of RIM-67B

YEAR     QTY     CUM QTY     CUM AVE COST

1976      22        22         4.4421
1977      36        58         3.0591
1978      40        98         2.6326
1979      40       138         2.4300
1980      55       193         2.0713
1981     275       468         1.2077
1982     375       843         0.9544
1983     375      1218         0.9158
1984     450      1668         0.8648

Source: U.S. Missile Data Book, 1982
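One way to obtain the cumulative average cost at the 1000th unit from the Table IV plot points is a least-squares fit in logarithms. The sketch below is an illustration only: the text does not state whether the curve in Figure 4.1 was fitted by least squares or visually, so the fitting method is an assumption, and the plot points are entered as read from Table IV.

```python
import math

# (cumulative quantity, cumulative average cost) plot points for RIM-67B,
# as read from Table IV.
PLOT_POINTS = [
    (22, 4.4421), (58, 3.0591), (98, 2.6326), (138, 2.4300), (193, 2.0713),
    (468, 1.2077), (843, 0.9544), (1218, 0.9158), (1668, 0.8648),
]

def fit_log_linear(points):
    """Least-squares fit of log y = log a + b log x; returns (a, b)."""
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(y) for _, y in points]
    n = len(points)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b

a, b = fit_log_linear(PLOT_POINTS)
slope_percent = 100 * 2 ** b        # learning-curve slope implied by b
estimate_1000 = a * 1000 ** b       # cumulative average cost at unit 1000

print(f"a={a:.3f}  b={b:.4f}  slope={slope_percent:.1f}%  y(1000)={estimate_1000:.4f}")
```

Because every point receives equal weight in this fit, the early lots influence the slope as much as the later ones; an analyst following the caution given under APPLICATIONS might instead weight or select the later points.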
V.


CONCLUSION


This thesis has outlined some of the better-known
methods that are used to develop cost equations. These
estimating techniques range in difficulty from the simple
(simple regression) to the more difficult (multiple
regression). The latter may require the combined talents of
a statistician, engineer and accountant. Statistical
techniques are generally justified when the estimates are to
be used in recurring decisions. The expense involved in
gathering and analyzing the data for multiple regression is
not usually justified if the estimate of the cost equation
is to be used for only a single decision. However, the
missile budget data are available, so that CERs for this
area are practical.

There are some difficulties in using the statistical
method for this type of study. First, there is the basic
problem of obtaining a sufficient number of observations to
support the distribution assumptions and to reduce the
standard error. The variance of the error term is usually
not known and must be estimated by the standard error. The
confidence or prediction intervals depend on this measure
and will be quite wide if the standard error of the
estimating equation is large. The error might be reduced as
the number of observations increases. Similarly, the
confidence intervals are dependent on the range of the
observed values of the independent variable x. They will
again be relatively wide if the range is limited and the new
point is outside the range of the independent variable. The
number of observations can sometimes be increased by using
additional time periods for which cost observations are
available.

I also would like to raise some questions about the
validity of using the least squares criterion as a basis for
cost estimation. The "least squares" estimate minimizes the
sum of the squared deviations of actual cost observations
from their estimates. By its very construction it imputes a
disproportionate weight to the influence of larger
deviations compared to smaller ones. This leads to the
so-called "outlier" problem, as with the BGM-109C and
MIM-104. In collecting cost observations to be included in
the calculation of the parameter values, there is a tendency
to discard those observations that seem to lie outside a
normal trend line in order to remove a possible bias in the
estimating equation. That is, their inclusion will cause the
estimated cost line to tilt upward or downward in order to
reduce the squared deviations between these observations and
their estimates. The assumption is that outliers are merely
unusual occurrences and therefore should not be used to
derive estimates of the normal relationship between a cost
item and some explanatory variable. However, so-called
outliers may reflect something more basic. For example,
observations that depart from the normal trend line at the
extreme ends of the range of activity used in the analysis
may reflect a nonlinear cost relationship between the cost
item and the explanatory variable.
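The influence of a single large deviation on a least-squares line can be seen in a small numerical experiment. The data below are hypothetical and exactly linear except for one altered point; they are not drawn from the missile sample.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def sum_squared_deviations(xs, ys, a, b):
    """Quantity minimized by the least-squares criterion."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations lying exactly on y = x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys_clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Same data with the last observation replaced by an outlier.
ys_outlier = ys_clean[:-1] + [12.0]

a_clean, b_clean = fit_line(xs, ys_clean)
a_out, b_out = fit_line(xs, ys_outlier)

print(f"without outlier: a={a_clean:.3f}  b={b_clean:.3f}")
print(f"with outlier:    a={a_out:.3f}  b={b_out:.3f}")
```

Five of the six points still lie on the original line, yet the fitted line tilts toward the altered point because its squared deviation dominates the sum being minimized.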

With these difficulties in the statistical method in mind,
in this thesis I have introduced the estimating of the cost
of missiles. It is worthwhile to further research the
methodology of cost estimating for military hardware.


EXAMPLE OF CALCULATIONS FOR THE LEARNING CURVE

FY      CUM QTY     TC       CUM AVE COST

1976       22       53.7     (53.700/22)/0.5495    = 4.4421
1977       58       46.8     (104.205/58)/0.5873   = 3.0591
1978       98       50.7     (162.409/98)/0.6295   = 2.6326
1979      138       53.2     (230.712/138)/0.6880  = 2.4300
1980      193       50.8     (314.965/193)/0.7879  = 2.0713
1981      468      142.8     (487.372/468)/0.8623  = 1.2077
1982      843      223.4     (750.633/843)/0.9320  = 0.9544
1983     1218      310.0     (1115.435/1218)/1.000 = 0.9158
1984     1668      347.3     (1442.418/1668)/1.000 = 0.8648
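The arithmetic in each row can be reproduced as shown below for a few of the rows. The sketch takes the cumulative cost, cumulative quantity, and divisor exactly as they appear in the table; reading the divisor as the Table I price adjustment factor for that year is an assumption, as are the function and variable names.

```python
# (fiscal year, cumulative quantity, cumulative cost, divisor, tabulated result)
# for selected rows of the table above.
ROWS = [
    (1976, 22, 53.700, 0.5495, 4.4421),
    (1977, 58, 104.205, 0.5873, 3.0591),
    (1979, 138, 230.712, 0.6880, 2.4300),
    (1980, 193, 314.965, 0.7879, 2.0713),
    (1983, 1218, 1115.435, 1.000, 0.9158),
]

def cumulative_average_constant(cum_cost, cum_qty, index):
    """Cumulative average cost divided by the (assumed) price adjustment factor."""
    return (cum_cost / cum_qty) / index

for fy, qty, cost, index, tabulated in ROWS:
    computed = cumulative_average_constant(cost, qty, index)
    print(fy, round(computed, 4), tabulated)
```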


MISSILE CLASSIFICATION


Missiles are classified by general characteristic
groupings or designators. Appendix B is a cross-reference
listing by designator.


These groupings or designators may show in what manner a
missile is used, but they will not identify a particular
missile. This general classification makes use of three
items: launch environment, target environment (or mission),
and type of vehicle.


The first letter is used to designate the launch
environment, which may be air, ground, underground, or
underwater. Thus the letters are "A" for air, "G" for
ground, "L" for underground or silo launched, and "U" for
underwater. The second letter is used to designate the
target environment or mission. This letter may be "I" for
interceptor, "S" for surface target, or "Q" for drone. The
third letter designates the type of vehicle as "M" for
missile or "R" for rocket. An example of this general
classification is illustrated:


TOMAHAWK: YBGM-109A

Y      Status Prefix (Prototype)
B      Launch Environment (Multiple)
G      Mission (Surface Attack)
M      Vehicle Type (Guided Missile)
109    Design Number
A      Modification


DESIGNATORS      MISSILES NAME

AIM-7F/M         SPARROW III
AIM-9L/M         SIDEWINDER
MIM-23B          IMPROVED HAWK
MGM-52C          LANCE
AIM-54A          PHOENIX
AIM-54C          PHOENIX
AGM-65A          MAVERICK (EO)
AGM-65D          MAVERICK (IIR)
RIM-66C          STANDARD II MR
RIM-67B          STANDARD II ER
MIM-72C          CHAPARRAL
FGM-77A          DRAGON
RGM-84A          HARPOON
AGM-86B          ALCM
AGM-88A          HARM
FIM-92A          STINGER
MIM-104          PATRIOT
BGM-109C         SLCM
BGM-109A/B       TOMAHAWK
AGM-114A         HELLFIRE
MIM-115          ROLAND II
-                PERSHING II
APPENDIX C
First
Letter   Title            Description

A        Air              Launched from aircraft while in flight.

B        Multiple         Capable of being launched from more than
                          one environment.

C        Coffin           Horizontally stored in a protective
                          enclosure and launched from the ground.

F        Individual       Carried by one man.

H        Silo Stored      Vertically stored below ground level and
                          launched from the ground.

L        Silo Launched    Vertically stored and launched from
                          below ground level.

M        Mobile           Launched from a ground vehicle or
                          moveable platform.

P        Soft Pad         Partially or nonprotected in storage and
                          launched from the ground.

R        Ship             Launched from a surface vessel such as a
                          ship, barge, etc.

U        Underwater       Launched from a submarine or other
                          underwater device.
U        Underwater       Vehicles designed to destroy enemy
         attack           submarine or other underwater targets.

W        Weather          Vehicles designed to observe, record,
                          or relay data pertaining to
                          meteorological phenomena.
Third
Letter   Title            Description

M        Guided Missile   As the third letter in a missile
                          designator, it identifies an unmanned,
                          self-propelled vehicle. Such a vehicle is
                          designed to move in a trajectory which
                          may be entirely or partially above the
                          earth's surface. While in motion this
                          vehicle can be controlled remotely, by
                          homing systems, or by inertial and/or
                          programmed guidance from within. The term
                          "guided missile" does not include space
                          vehicles, space boosters, or naval
                          torpedoes, but it does include target and
                          reconnaissance drones.

N        Probe            The letter "N" is used to indicate
                          nonorbital instrumented vehicles which
                          are not involved in space missions. These
                          vehicles are used to penetrate the space
                          environment and transmit or report back
                          information.

R        Rocket           This identifies a self-propelled vehicle
                          without installed or remote control
                          guidance mechanisms. Once launched, the
                          trajectory or flight path of such a
                          vehicle cannot be changed.